Graph Grammar Based Analysis System of Complex Table Form Document

Authors

  • Akira Amano
  • Naoki Asada
Abstract

Structure analysis of table-form documents is important because printed documents, and also electronic documents, explicitly provide only geometrical layout and lexical information. To handle these documents automatically, logical structure information is necessary. In this paper, we first propose a general XML-based representation of table-form documents that contains both structure and layout information. Next, we present a structure analysis system based on a graph grammar that represents knowledge of document structure. Because the relations between adjacent fields in table-form documents are two-dimensional, a two-dimensional notation is necessary to denote structural knowledge; we therefore adopt a two-dimensional graph grammar. Because the rules are relatively simple, the grammar notation makes this knowledge easy to modify and to keep consistent. Another advantage of the grammar notation is that it can be used to generate documents from the logical structure alone. Experimental results show that the system successfully analyzed several kinds of table forms.
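
The idea in the abstract can be illustrated with a heavily simplified sketch. It is not the authors' grammar or their XML schema: the `Field` type, the single "item" rule, the adjacency tests, and the element and attribute names are all assumptions made for illustration. The sketch treats fields as boxes, applies one two-dimensional rule (a label field with a value field directly to its right or directly below forms an item), and writes the result as XML that keeps both the logical grouping and the layout coordinates.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Field:
    """A rectangular form field (hypothetical type, not from the paper)."""
    name: str   # illustrative field identifier
    kind: str   # "label" or "value"
    x: int      # left edge
    y: int      # top edge
    w: int      # width
    h: int      # height


def right_of(a: Field, b: Field) -> bool:
    """True if b sits directly to the right of a with the same vertical extent."""
    return b.x == a.x + a.w and b.y == a.y and b.h == a.h


def below(a: Field, b: Field) -> bool:
    """True if b sits directly below a with the same horizontal extent."""
    return b.y == a.y + a.h and b.x == a.x and b.w == a.w


def apply_item_rule(fields):
    """Assumed rule: label + adjacent value (right or below) -> item."""
    items = []
    used = set()  # names of value fields already consumed by an item
    for a in fields:
        if a.kind != "label":
            continue
        for b in fields:
            if b.kind == "value" and b.name not in used and (right_of(a, b) or below(a, b)):
                items.append((a, b))
                used.add(b.name)
                break
    return items


def to_xml(items) -> str:
    """Serialize items so each one carries logical structure and layout."""
    root = ET.Element("form")
    for label, value in items:
        item = ET.SubElement(root, "item", label=label.name)
        for f in (label, value):
            ET.SubElement(item, f.kind, name=f.name,
                          x=str(f.x), y=str(f.y), w=str(f.w), h=str(f.h))
    return ET.tostring(root, encoding="unicode")


fields = [
    Field("name_label", "label", 0, 0, 50, 20),
    Field("name_value", "value", 50, 0, 100, 20),   # to the right of name_label
    Field("addr_label", "label", 0, 20, 150, 20),
    Field("addr_value", "value", 0, 40, 150, 20),   # below addr_label
]
items = apply_item_rule(fields)
xml_text = to_xml(items)
```

Because the rule is stated over adjacency rather than over fixed positions, the same rule matches both the horizontal and the vertical label/value arrangement, which is the kind of two-dimensional relation the abstract refers to.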


Similar resources

Mapping of McGraw Cycle to RUP Methodology for Secure Software Developing

Designing secure software is one of the major phases in developing robust software. The McGraw life cycle, one of the well-known software security development approaches, implements different touch points as a collection of software security practices. Each touch point includes explicit instructions for applying security in terms of design, coding, measurement, and maintenance of softwar...


Reliability estimation of Iran's power network

Today, the electric power system is the most complicated engineering system that has ever been made. The integration of power generating stations with power transmission lines has created a network called a complex power network. Estimating the reliability of such complex power networks is a very challenging problem, as no immediate solution method can be found in the current literature. In this pape...


The Effect of Written Corrective Feedback on the Accuracy of Output Task and Learning of Target Form

The effect of error feedback on the accuracy of output task types, such as editing, text reconstruction, picture-cued writing, and dictogloss tasks, has not been clearly explored. Following the argument that combining corrective feedback and output makes it difficult to determine whether their effects arise in combination or alone, the purpose of the present st...


Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...


Mathematical formula recognition using graph grammar

This paper describes current results of Ofr (Optical Formula Recognition), a system for extracting and understanding mathematical expressions in documents. Such a tool could be useful for reusing knowledge in scientific books that are not available in electronic form. We are also currently studying the use of this system for direct input of formulas with a graphical tablet for computer al...




Publication date: 2003